Robust speech recognition using VAD-measure-embedded decoder
Abstract
In a speech recognition system, a voice activity detector (VAD) is a crucial component, not only for maintaining accuracy but also for reducing computational cost. Front-end approaches, which drop non-speech frames, typically detect speech frames using speech/non-speech classification information such as the zero-crossing rate or statistical models. These approaches discard the speech/non-speech classification information once voice detection is done. This paper proposes an approach that instead uses this information to adjust the scores of the recognition hypotheses. Experimental results show that the approach improves accuracy significantly and, when combined with the front-end method, also reduces computational cost.
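The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the VAD supplies a per-frame speech probability p and that each hypothesis carries a frame-level alignment marking frames as speech or silence units. The hypothesis score then receives a weighted bonus of log p for frames it assigns to speech and log(1 - p) for frames it assigns to silence. `Hypothesis`, `vad_adjusted_score`, `rerank`, and `weight` are hypothetical names. The paper embeds the measure in the decoder's search; this sketch rescores complete hypotheses for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """A complete recognition hypothesis (hypothetical structure)."""
    words: str
    acoustic_log_score: float    # log score produced by the recognizer
    frame_is_speech: List[bool]  # True where the alignment places a speech unit


def vad_adjusted_score(hyp: Hypothesis, speech_prob: Sequence[float],
                       weight: float = 1.0, eps: float = 1e-6) -> float:
    """Add a weighted VAD log-probability term to the hypothesis score."""
    if len(hyp.frame_is_speech) != len(speech_prob):
        raise ValueError("alignment and VAD measure must cover the same frames")
    bonus = 0.0
    for is_speech, p in zip(hyp.frame_is_speech, speech_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        # Reward agreement between the hypothesis alignment and the VAD.
        bonus += math.log(p) if is_speech else math.log(1.0 - p)
    return hyp.acoustic_log_score + weight * bonus


def rerank(hyps: List[Hypothesis], speech_prob: Sequence[float],
           weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses best-first by their VAD-adjusted score."""
    return sorted(hyps, key=lambda h: vad_adjusted_score(h, speech_prob, weight),
                  reverse=True)
```

For example, with `speech_prob = [0.1, 0.9, 0.95, 0.2]`, a hypothesis with acoustic score -10.0 that aligns the two outer, noise-like frames to silence outranks one with acoustic score -9.5 that treats every frame as speech. The VAD term penalizes the second hypothesis for claiming low-probability frames as speech.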
Similar resources
VAD-measure-embedded decoder with online model adaptation
We previously proposed a decoding method for automatic speech recognition utilizing hypothesis scores weighted by voice activity detection (VAD)-measures. This method uses two Gaussian mixture models (GMMs) to obtain confidence measures: one for speech, the other for non-speech. To achieve good search performance, we need to adapt the GMMs properly for input utterances and environmental noise. ...
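As a rough illustration of how two GMMs can yield a VAD confidence measure, the sketch below computes the speech/non-speech log-likelihood ratio for a feature value and converts it into a posterior speech probability. It makes several assumptions not stated in the abstract: one-dimensional features (for example frame log-energy), GMMs given as hypothetical `(weight, mean, variance)` triples, and a fixed speech prior. Online adaptation of the GMMs, which the cited paper addresses, is not shown.

```python
import math
from typing import List, Tuple

# A 1-D GMM as (weight, mean, variance) triples; real systems use vector features.
GMM = List[Tuple[float, float, float]]


def gmm_log_likelihood(x: float, gmm: GMM) -> float:
    """log p(x | GMM), computed with log-sum-exp for numerical stability."""
    terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
             for w, mu, var in gmm]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def speech_probability(x: float, speech_gmm: GMM, nonspeech_gmm: GMM,
                       prior_speech: float = 0.5) -> float:
    """Posterior P(speech | x) from the two GMMs and a speech prior."""
    llr = gmm_log_likelihood(x, speech_gmm) - gmm_log_likelihood(x, nonspeech_gmm)
    logit = llr + math.log(prior_speech / (1.0 - prior_speech))
    # Numerically stable logistic function.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

The log-sum-exp and the branch in the logistic function keep the computation finite even when one model assigns a vanishingly small likelihood, which is common for frames far from either model's means.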
A Low-Cost Robust Front-end for Embedded ASR System
In this paper we propose a low-cost robust MFCC feature extraction algorithm that combines noise reduction and voice activity detection (VAD) for automatic speech recognition (ASR) systems in embedded applications. To remedy the effect of additive noise, a magnitude spectrum subtraction method is used. A VAD is performed to distinguish the speech signal from the noise signal. It discriminates speech/non...
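A minimal sketch of magnitude spectral subtraction followed by a simple VAD, assuming frame magnitude spectra have already been computed (no FFT is shown) and that the first few frames contain only noise. The VAD criterion in the abstract is truncated, so the energy-ratio threshold here is a stand-in assumption; `estimate_noise`, `spectral_subtract`, `energy_vad`, `alpha`, `floor`, and `threshold` are hypothetical names and parameters.

```python
from typing import List

Spectrum = List[float]  # magnitude spectrum of one frame


def estimate_noise(frames: List[Spectrum], n_init: int) -> Spectrum:
    """Average the first n_init frames, assumed to contain only noise."""
    init = frames[:n_init]
    if not init:
        raise ValueError("need at least one noise-only frame")
    return [sum(col) / len(init) for col in zip(*init)]


def spectral_subtract(frames: List[Spectrum], noise: Spectrum,
                      alpha: float = 1.0, floor: float = 0.01) -> List[Spectrum]:
    """Subtract the scaled noise magnitude per bin, with a spectral floor."""
    # The floor keeps each bin at least floor * original magnitude,
    # avoiding negative magnitudes after subtraction.
    return [[max(m - alpha * n, floor * m) for m, n in zip(frame, noise)]
            for frame in frames]


def energy_vad(frames: List[Spectrum], noise: Spectrum,
               threshold: float = 2.0) -> List[bool]:
    """Flag frames whose energy exceeds threshold times the noise energy."""
    noise_energy = sum(n * n for n in noise) + 1e-12
    return [sum(m * m for m in f) / noise_energy > threshold for f in frames]
```

Usage: estimate the noise from the leading frames, enhance all frames with `spectral_subtract`, then run `energy_vad` on the enhanced frames so that only frames flagged as speech are passed to MFCC extraction.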
A Hybrid HMM/TRAPS Model for Robust Voice Activity Detection
We present three voice activity detection (VAD) algorithms that are suitable for the off-line processing of noisy speech and compare their performance on SPINE-2 evaluation data using speech recognition error rate as the quality metric. One VAD system is a simple HMM-based segmenter that uses normalized log-energy and a degree of voicing measure as raw features. The other two VAD systems focus ...
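The abstract does not detail the HMM-based segmenter. A minimal sketch of that kind of system, assuming a two-state HMM (non-speech, speech) with Gaussian emissions over utterance-normalized log-energy, decoded with the Viterbi algorithm, is shown below. The degree-of-voicing feature is omitted, and the emission parameters and `p_stay` self-loop probability are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def normalize(xs: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance normalization over the utterance."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs)) or 1.0  # guard constant input
    return [(x - mu) / sd for x in xs]


def gauss_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_segment(log_energy: Sequence[float],
                    emissions: Tuple[Tuple[float, float], Tuple[float, float]] = ((-1.0, 0.5), (1.0, 0.5)),
                    p_stay: float = 0.9) -> List[int]:
    """Label each frame 0 (non-speech) or 1 (speech) with a 2-state HMM."""
    feats = normalize(log_energy)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    states = (0, 1)
    # Uniform initial state distribution.
    delta = [math.log(0.5) + gauss_logpdf(feats[0], *emissions[s]) for s in states]
    backptrs: List[List[int]] = []
    for x in feats[1:]:
        new_delta, ptr = [], []
        for s in states:
            # Score each predecessor r: self-loop if r == s, otherwise a switch.
            scores = [delta[r] + (log_stay if r == s else log_switch) for r in states]
            best = max(states, key=lambda r: scores[r])
            new_delta.append(scores[best] + gauss_logpdf(x, *emissions[s]))
            ptr.append(best)
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

The self-loop probability acts as a smoothing term: a higher `p_stay` makes switching between speech and non-speech more costly, which suppresses very short spurious segments.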
Robust Speech Recognition in a Car Using a Microphone Array
The performance of automatic speech recognition relies on a vast amount of training speech data, mostly recorded with little or no background noise. Performance degrades significantly in the presence of background noise, which increases the mismatch between training and test environments. Speech enhancement techniques can reduce this mismatch by extracting reliable speech features fro...